
    A new spiking convolutional recurrent neural network (SCRNN) with applications to event-based hand gesture recognition

    The combination of neuromorphic visual sensors and spiking neural networks offers a highly efficient, bio-inspired solution to real-world applications. However, processing event-based sequences remains challenging because of their asynchronous and sparse nature. In this paper, a novel spiking convolutional recurrent neural network (SCRNN) architecture is presented that takes advantage of both convolution operations and recurrent connectivity to maintain the spatial and temporal relations in event-based sequence data. The use of a recurrent architecture enables the network to have a sampling window of arbitrary length, allowing the network to exploit temporal correlations between event collections. Rather than using standard ANN-to-SNN conversion techniques, the network is trained with the supervised Spike Layer Error Reassignment (SLAYER) mechanism, which allows it to adapt to neuromorphic (event-based) data directly. The network structure is validated on the DVS gesture dataset and achieves a 10-class gesture recognition accuracy of 96.59% and an 11-class gesture recognition accuracy of 90.28%.
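
    As a rough illustration of the idea, the sketch below implements a single leaky integrate-and-fire convolutional-recurrent cell in PyTorch. It is a minimal sketch only: the layer sizes, decay and threshold values are assumptions, and the SLAYER training mechanism used in the paper is not reproduced here (only the forward dynamics are shown).

    ```python
    import torch
    import torch.nn as nn

    class SCRNNCell(nn.Module):
        """Leaky integrate-and-fire conv-recurrent cell (forward dynamics only)."""
        def __init__(self, in_ch, out_ch, decay=0.9, threshold=1.0):
            super().__init__()
            self.conv_in = nn.Conv2d(in_ch, out_ch, 3, padding=1)    # feedforward path
            self.conv_rec = nn.Conv2d(out_ch, out_ch, 3, padding=1)  # recurrent path
            self.decay, self.threshold = decay, threshold

        def forward(self, x, state):
            v, s_prev = state
            # leaky integration of feedforward input and recurrent spikes
            v = self.decay * v + self.conv_in(x) + self.conv_rec(s_prev)
            s = (v >= self.threshold).float()  # fire where threshold is crossed
            v = v * (1.0 - s)                  # hard reset of fired units
            return s, (v, s)

    # Usage: iterate over binned DVS event frames (on/off polarity channels);
    # the recurrent state is what lets the sampling window be arbitrary.
    cell = SCRNNCell(in_ch=2, out_ch=16)
    v = torch.zeros(1, 16, 64, 64)
    s = torch.zeros_like(v)
    for _ in range(10):
        frame = torch.rand(1, 2, 64, 64)  # stand-in for an event-count frame
        s, (v, _) = cell(frame, (v, s))
    ```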

    Prosodic feature extraction for assessment and treatment of dysarthria

    Dysarthria, a neurological motor speech disorder caused by lesions to the central and peripheral nervous systems, accounted for over 40% of neurological disorders referred to pathologists in 2013 [1]. It affects speakers' ability to control the movement of the speech-production muscles due to muscle weakness. Dysarthria is characterised by reduced loudness, high pitch variability, monotonous speech, poor voice quality and reduced intelligibility [2]. Current techniques for dysarthria assessment are based on perception and do not give objective measurements of the severity of this speech disorder. There is therefore a need to explore objective techniques for dysarthria assessment and treatment. The goal of this research is to identify and extract the main acoustic features that describe the type and severity of the disorder. An acoustic feature extraction and classification technique is proposed in this work. The method involves a pre-processing stage in which audio samples are filtered to remove noise and resampled at 8 kHz. In the feature extraction stage, pitch, intensity, formants, zero-crossing rate, speech rate and cepstral coefficients are extracted from the speech samples. Classification of the extracted features is carried out using a single-layer neural network. After classification, a treatment tool is to be developed to assist patients, through tailored exercises, to improve their articulatory ability, intelligibility, intonation and voice quality. Consequently, the proposed technique will assist speech therapists in tracking the progress of patients over time. It will also provide an objective acoustic measurement for dysarthria severity assessment. Potential applications of this technology include management of cognitive speech impairments, treatment of speech difficulties in children, and other advanced speech and language applications.
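
    A minimal sketch of such a feature extraction stage, using librosa, is shown below. The 8 kHz resampling, the feature set and the summary statistics follow the pipeline described above, but the specific function choices (YIN for pitch, RMS as an intensity proxy) are assumptions; formant and speech rate extraction are omitted, since they typically require additional tooling such as Praat.

    ```python
    import numpy as np
    import librosa

    def extract_features(path):
        # Pre-processing: load and resample to 8 kHz as in the proposed pipeline
        y, sr = librosa.load(path, sr=8000)
        f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)       # pitch track (YIN)
        zcr = librosa.feature.zero_crossing_rate(y)         # zero-crossing rate
        rms = librosa.feature.rms(y=y)                      # intensity proxy
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # cepstral coefficients
        return np.hstack([
            f0.mean(), f0.std(),    # pitch level and variability
            zcr.mean(), rms.mean(),
            mfcc.mean(axis=1),      # 13 averaged MFCCs
        ])  # feature vector for the single-layer neural network
    ```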

    SpikeSEG : Spiking segmentation via STDP saliency mapping

    Taking inspiration from the structure and behaviour of the human visual system, and using the transposed convolution and saliency mapping methods of Convolutional Neural Networks (CNNs), a spiking event-based image segmentation algorithm, SpikeSEG, is proposed. The approach makes use of both spike-based imaging and spike-based processing: the images are either standard images converted to spiking images or are generated directly from a neuromorphic event-driven sensor, and are then processed using a spiking fully convolutional neural network. The spiking segmentation method uses the spike activations through time within the network to trace any outputs from the saliency maps back to their exact pixel locations. This gives exact pixel locations for spiking segmentation, with low latency and computational overhead. SpikeSEG is the first spiking event-based segmentation network and, over three experimental tests, achieves promising results of 96% overall accuracy and a 74% mean intersection over union for the segmentation, all within an event-by-event framework.
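
    The traceback idea might be sketched as follows, assuming spike maps are recorded per layer and that every layer preserves spatial resolution; this is an illustration of the concept only, not the published algorithm, which works through transposed convolutions in a trained network.

    ```python
    import torch
    import torch.nn.functional as F

    def trace_to_pixels(output_mask, spike_maps, k=3):
        """Project an output saliency mask back through recorded per-layer
        spike maps to input pixel locations (conceptual illustration only)."""
        mask = output_mask.float()           # shape (1, 1, H, W)
        kernel = torch.ones(1, 1, k, k)
        for spikes in reversed(spike_maps):  # deepest layer first
            # expand the mask over each unit's receptive field...
            mask = (F.conv_transpose2d(mask, kernel, padding=k // 2) > 0).float()
            # ...then keep only locations that actually spiked
            mask = mask * spikes.float()
        return mask  # binary map of contributing input pixels
    ```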

    Perception understanding action : adding understanding to the perception action cycle with spiking segmentation

    Traditionally, the Perception Action cycle is the first stage of building an autonomous robotic system and a practical way to implement a low-latency reactive system within a low Size, Weight and Power (SWaP) package. However, within complex scenarios this method can lack contextual understanding of the scene, such as object-recognition-based tracking or system attention. Object detection, identification and tracking, along with semantic segmentation and attention, are all modern computer vision tasks in which Convolutional Neural Networks (CNNs) have shown significant success, although such networks often have large computational overheads and power requirements, which are not ideal for smaller robotics tasks. Furthermore, cloud computing and massively parallel processing, as in Graphics Processing Units (GPUs), fall outside the specification of many tasks due to their respective latency and SWaP constraints. In response, Spiking Convolutional Neural Networks (SCNNs) look to provide the feature extraction benefits of CNNs while maintaining low latency and power overhead thanks to their asynchronous, spiking, event-based processing. A novel Neuromorphic Perception Understanding Action (PUA) system is presented that aims to combine the feature extraction benefits of CNNs with the low-latency processing of SCNNs. The PUA uses a Neuromorphic Vision Sensor for Perception, which feeds asynchronous processing within a spiking fully convolutional neural network (SpikeCNN) to provide semantic segmentation and Understanding of the scene. The output is fed to a spiking control system providing Actions. With this approach, the aim is to bring features of deep learning into the lower levels of autonomous robotics, while maintaining a biologically plausible STDP rule throughout the learned encoding part of the network. The network is shown to provide more robust and predictable management of spiking activity, with an improved thresholding response. The reported experiments show that this system can deliver robust results of over 96% accuracy and 81% Intersection over Union, ensuring such a system can be successfully used within object recognition, classification and tracking problems. This demonstrates that the attention of the system can be tracked accurately, while the asynchronous processing means the controller can give precise track updates with minimal latency.
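
    Purely as an illustration of the Perception-Understanding-Action flow, the sketch below wires stub stages into a loop; the segmentation and control stubs are hypothetical stand-ins for the paper's STDP-trained SpikeCNN and spiking controller.

    ```python
    import numpy as np

    def understand(events):
        """Stand-in for the SpikeCNN segmentation stage: returns a class map."""
        seg = np.zeros(events.shape, dtype=int)
        seg[20:40, 20:40] = 1  # pretend class 1 was segmented here
        return seg

    def act(seg, target=1):
        """Stand-in for the spiking controller: track update from the
        attended object's centroid."""
        ys, xs = np.nonzero(seg == target)
        return None if ys.size == 0 else (ys.mean(), xs.mean())

    # Perception -> Understanding -> Action, one pass per event batch
    for _ in range(3):
        events = np.random.rand(64, 64)  # stand-in for a DVS event frame
        print(act(understand(events)))
    ```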

    A novel decentralised system architecture for multi-camera target tracking

    Target tracking in a multi-camera system is an active and challenging research area that, in many systems, requires video synchronisation and knowledge of the camera set-up and layout. In this paper, a highly flexible, modular and decentralised system architecture is presented for multi-camera target tracking with relaxed synchronisation constraints among camera views. Moreover, the system does not rely on positional information to handle camera hand-off events. As a practical application, the system can, at any time, automatically select the best available view of the target, implicitly resolving occlusion. Further, to validate the proposed architecture, an extension of the colour-based IMS-SWAD tracker to a multi-camera environment is used. The experimental results show that the tracker can successfully track a chosen target in multiple views, in both indoor and outdoor environments, with non-overlapping and overlapping camera views.
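
    The best-view selection could look something like the sketch below: each camera's tracker reports a match confidence, and the system picks the most confident sufficiently recent report. The report fields, freshness window and selection rule are illustrative assumptions, not the published design.

    ```python
    from dataclasses import dataclass

    @dataclass
    class TrackReport:
        camera_id: str
        timestamp: float   # local camera time, only loosely synchronised
        confidence: float  # e.g. colour-histogram match score

    def best_view(reports, now, max_age=0.5):
        """Pick the most confident recent report; relaxed synchronisation
        only requires reports to fall within max_age seconds of now."""
        fresh = (r for r in reports if now - r.timestamp <= max_age)
        return max(fresh, key=lambda r: r.confidence, default=None)

    # Example: camera B currently offers the best (least occluded) view
    reports = [TrackReport("A", 9.8, 0.41), TrackReport("B", 9.9, 0.87)]
    print(best_view(reports, now=10.0))
    ```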

    Co-operative surveillance cameras for high quality face acquisition in a real-time door monitoring system

    A poster session on co-operative surveillance cameras for high quality face acquisition in a real-time door monitoring system.

    Deep convolutional spiking neural network based hand gesture recognition

    Novel technologies for EMG (electromyogram) based hand gesture recognition have been investigated for many industrial applications. In this paper, a novel approach is presented, based on a specifically designed spiking convolutional neural network fed by a novel EMG signal energy density map. The experimental results indicate that the new approach not only markedly decreases the required processing time but also increases the average recognition accuracy to 98.76% on the Strathclyde dataset and 98.21% on the CapgMyo open-source dataset. A comparison of experimental results between the proposed EMG-based hand gesture recognition methodology and other similar approaches indicates the superior effectiveness of the new design.
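
    The abstract does not specify how the map is constructed, but a short-time energy map over EMG channels might be computed along the following lines; the window, hop and normalisation choices are assumptions.

    ```python
    import numpy as np

    def energy_density_map(emg, win=32, hop=16):
        """emg: (channels, samples) array. Returns a (channels, frames) map
        of short-time mean power, normalised to [0, 1] like an image."""
        ch, n = emg.shape
        frames = 1 + (n - win) // hop
        m = np.empty((ch, frames))
        for f in range(frames):
            seg = emg[:, f * hop : f * hop + win]
            m[:, f] = (seg ** 2).sum(axis=1) / win  # mean power per window
        return (m - m.min()) / (np.ptp(m) + 1e-12)

    # Example: 8-channel recording, 1 s at 1 kHz -> 8 x 61 map
    print(energy_density_map(np.random.randn(8, 1000)).shape)
    ```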

    Analysis of deep learning architectures for turbulence mitigation in long-range imagery

    In long-range imagery, the atmosphere along the line of sight can produce unwanted visual effects. Random variations in the refractive index of the air cause light to shift and distort. When captured by a camera, this randomly induced variation results in blurred and spatially distorted images. The removal of such effects is greatly desired. Many traditional methods are able to reduce the effects of turbulence within images; however, they require complex optimisation procedures or have large computational complexity. The use of deep learning for image processing has now become commonplace, with neural networks able to outperform traditional methods in many fields. This paper presents an evaluation of various deep learning architectures on the task of turbulence mitigation. The core disadvantage of deep learning is its dependence on a large quantity of relevant data. For the task of turbulence mitigation, real-life data is difficult to obtain, as a clean undistorted image is not always obtainable. Turbulent images were therefore generated with a turbulence simulator, which can accurately represent atmospheric conditions and apply the resulting spatial distortions to clean images. This paper provides a comparison between current state-of-the-art image reconstruction convolutional neural networks. Each network is trained on simulated turbulence data and then assessed on a series of test images. It is shown that the networks are unable to provide high-quality output images; however, they are able to reduce the effects of spatial warping within the test images. This paper provides critical analysis of the effectiveness of applying deep learning, showing that it has potential in this field and can be used to make further improvements in the future.
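
    As a crude stand-in for such a simulator, the sketch below warps a clean image with a smooth random displacement field and adds blur to produce (distorted, clean) training pairs. The published simulator models atmospheric conditions far more accurately; all parameters here are illustrative.

    ```python
    import numpy as np
    from scipy.ndimage import gaussian_filter, map_coordinates

    def _smooth_noise(h, w, smooth):
        f = gaussian_filter(np.random.randn(h, w), smooth)
        return f / (f.std() + 1e-12)  # rescale to unit variance

    def simulate_turbulence(img, strength=2.0, smooth=8.0, blur=1.0):
        """img: 2-D grayscale array. Smooth random warp (~strength px) plus blur."""
        h, w = img.shape
        dy = _smooth_noise(h, w, smooth) * strength
        dx = _smooth_noise(h, w, smooth) * strength
        ys, xs = np.mgrid[0:h, 0:w]
        warped = map_coordinates(img, [ys + dy, xs + dx], order=1, mode='reflect')
        return gaussian_filter(warped, blur)  # distorted counterpart of img
    ```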

    Modified Capsule Neural Network (Mod-CapsNet) for indoor home scene recognition

    In this paper, a Modified Capsule Neural Network (Mod-CapsNet) with a pooling layer but without the squash function is used for recognition of indoor home scenes represented in grayscale. This Mod-CapsNet produced an accuracy of 70%, compared with the 17.2% accuracy produced by a standard CapsNet. Since larger datasets of indoor home scenes are lacking, obtaining better accuracy from smaller datasets is also one of the important aims of the paper. The numbers of images used for training and testing are 20,000 and 5,000 respectively, all of dimension 128×128. The analysis shows that, in the indoor home scene recognition task, the combination of capsules without a squash function and with max-pooling layers works better than capsules with convolutional layers. Indoor home scenes are specifically chosen to analyse capsule performance on datasets whose images have similarities but are, nonetheless, quite different; for example, tables may be present in both living rooms and dining rooms even though these are quite different rooms.
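
    For reference, the squash non-linearity that Mod-CapsNet removes is the standard CapsNet function sketched below in PyTorch; Mod-CapsNet replaces its role with max-pooling.

    ```python
    import torch

    def squash(s, dim=-1, eps=1e-8):
        """Standard CapsNet squash: shrinks short vectors toward zero and
        caps long ones near unit length. Mod-CapsNet omits this step."""
        sq = (s ** 2).sum(dim=dim, keepdim=True)       # squared norm per capsule
        return (sq / (1.0 + sq)) * s / torch.sqrt(sq + eps)
    ```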

    Automatic misclassification rejection for LDA classifier using ROC curves

    This paper presents a technique to improve the performance of an LDA classifier by determining whether a predicted classification output is a misclassification and, if so, rejecting it. This is achieved by automatically computing a class-specific threshold with the help of ROC curves. If the posterior probability of a prediction is below the threshold, the classification result is discarded. This method of minimizing false positives is beneficial in the control of electromyography (EMG) based upper-limb prosthetic devices. It is hypothesized that a unique EMG pattern is associated with each specific hand gesture. In reality, however, EMG signals are difficult to distinguish, particularly in the case of multiple finger motions, and hence classifiers are trained to recognize a set of individual gestures. It is imperative that misclassifications be avoided, because they result in unwanted prosthetic arm motions that are detrimental to device controllability. This warrants the proposed technique, wherein a misclassified gesture prediction is rejected, resulting in no motion of the prosthetic arm. The technique was tested using surface EMG data recorded from thirteen amputees performing seven hand gestures. Results show that the number of misclassifications was effectively reduced, particularly in cases with low original classification accuracy.
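
    A minimal sketch of this rejection scheme with scikit-learn is shown below. Reading each class threshold off its one-vs-rest ROC curve at a target false-positive rate is an assumed operating-point criterion; the paper's exact threshold selection rule may differ.

    ```python
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.metrics import roc_curve

    def class_thresholds(lda, X_val, y_val, target_fpr=0.05):
        """Per-class posterior thresholds from one-vs-rest ROC curves."""
        proba = lda.predict_proba(X_val)
        thresholds = {}
        for i, c in enumerate(lda.classes_):
            fpr, _, thr = roc_curve(y_val == c, proba[:, i])
            idx = np.clip(np.searchsorted(fpr, target_fpr), 1, len(thr) - 1)
            thresholds[c] = thr[idx]
        return thresholds

    def predict_with_rejection(lda, X, thresholds):
        proba = lda.predict_proba(X)
        preds = lda.classes_[proba.argmax(axis=1)]
        conf = proba.max(axis=1)
        # None -> rejected prediction, i.e. no prosthetic arm motion
        return [p if conf[i] >= thresholds[p] else None
                for i, p in enumerate(preds)]

    # Usage: lda = LinearDiscriminantAnalysis().fit(X_train, y_train), then
    # derive thresholds on a validation split before deployment.
    ```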